Type-based and Token-based Learning of Kanji Morphemes
Abstract
We have been developing methods of kanji morpheme analysis for the empirical modelling of terminology. In this paper we discuss the performance of kanji morpheme extraction and kanji sequence decomposition, both based on the same bigram statistics, focusing on the effect of type-based and token-based training. The experiment shows that type-based training gives consistently better performance, a result of both practical and theoretical importance.
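To make the type/token distinction concrete, here is a minimal Python sketch of how character-bigram counts differ under the two training regimes. It is an illustration under stated assumptions, not the paper's actual procedure: the abstract does not specify the bigram statistic used, so the function `bigram_counts`, the `term_freqs` input format, and the sample terms and frequencies are all hypothetical. Type-based counting treats each distinct term once; token-based counting weights each term by its corpus frequency.

```python
from collections import Counter


def bigram_counts(term_freqs, mode="type"):
    """Count adjacent-character bigrams over a list of terms.

    term_freqs: dict mapping term -> corpus frequency (hypothetical format).
    mode: "type" counts each distinct term once;
          "token" weights each term by its frequency.
    """
    if mode not in ("type", "token"):
        raise ValueError("mode must be 'type' or 'token'")
    counts = Counter()
    for term, freq in term_freqs.items():
        weight = 1 if mode == "type" else freq
        # Slide a window of two characters across the term.
        for left, right in zip(term, term[1:]):
            counts[left + right] += weight
    return counts


# Invented sample data: one very frequent term and two rarer ones.
terms = {"情報検索": 50, "情報処理": 3, "検索": 2}
type_counts = bigram_counts(terms, "type")
token_counts = bigram_counts(terms, "token")
```

In this toy data the bigram 報検, which spans a morpheme boundary, receives a token-based count of 50 but a type-based count of only 1, because it occurs in a single distinct term. This shows how one frequent term can dominate token-based statistics; whether that effect explains the reported result is not stated in the abstract.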
Similar resources
A Kana-Kanji Translation System for Non-Segmented Input Sentences Based on Syntactic and Semantic Analysis
This paper presents a disambiguation approach for translating non-segmented-Kana into Kanji. The method consists of two steps. In the first step, an input sentence is analyzed morphologically and ambiguous morphemes are stored in a network form. In the second step, the best path, which is a string of morphemes, is selected by syntactic and semantic analysis based on case grammar. In order to...
An Improved Token-Based and Starvation Free Distributed Mutual Exclusion Algorithm
Distributed mutual exclusion is a fundamental problem of distributed systems that coordinates access to critical shared resources. It concerns how the various distributed processes access the shared resources in a mutually exclusive manner. This paper presents a fully distributed, improved token-based mutual exclusion algorithm for distributed systems. In this algorithm, a process which...
Composition and Decomposition of Japanese Katakana and Kanji Morphemes for Decision Rule Induction from Patent Documents
We propose a new method to construct a word list for rule induction from Japanese patent documents. For word segmentation in Japanese, statistical morphological analyzers have been used in many applications. However, the output of these morphological analyzers presents defects when analyzing unknown words, specifically words that contain Kanji/Katakana morphemes. Some words are overly segmented...
An Audio-Based Approach to Mobile Learning of Japanese Kanji Characters
We describe the design and implementation of an audio-based computer system for mobile, nonvisual learning of the meaning and writing of "kanji" characters: the thousands of multi-stroke Chinese characters used in the Japanese logographic writing system. Our system is designed for use by non-native learners of Japanese as a foreign language. The key feature of our system is its innovative use o...
A Morpho-Syntactic Analyzer of Controlled Japanese
The proposed morpho-syntactic analyzer parses controlled Japanese texts such as articles in newspapers, technical magazines and professional journals and public documents that are transcribed wherever applicable by using Joyo Kanji (frequently used Chinese characters). The analyzer parses sentences in controlled Japanese texts into morpho-syntactic units, further dividing them into the content ...